Optimizing Queries with Materialized Views
ثبت نشده
چکیده
While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and efficient solution. Our solution has the desirable property that it is a simple generalization of the traditional query optimization algorithm. 1 I n t r o d u c t i o n The idea of using materialized views for the benefit of improved query processing has been proposed in the literature more than a decade ago. In this context, problems such as definition of views, composition of views, maintenance of views [BC79, KP81, SI84, BLT86, CW91, Rou91, GMS93] have been researched but one topic has been conspicuous by its absence. This concerns the problem of the judicious use of materialized views in answering a query. It may seem that materialized views should be used to evaluate a query whenever they are applicable. In fact, blind applications of materialized views may result in significantly worse plans compared to alternative plans that do not use any materialized views. Whether the use of materialized views will result in a better or a worse plan depends on the query and the statistical properties of the database. Since queries are often generated using tools and since the statistical property of databases are time-varying, it should be the responsibility of the optimizer to consider the alternative execution plans and to make a cost-based decision whether or not to use materialized views to answer a given query on a given database. Such enumeration of the possible alternatives by the optimizer *Author 's current address: Momferatoy 71, Athens, Greece ? Author ' s current address: IBM Almaden Research Center, San Jose must be syntax independent and efficient. By syntax independent, we mean that the set of alternatives enumerated by the optimizer (and hence the choice of the optimal execution plan) should not depend on whether or not the query explicitly references materialized views. Thus, the optimizer must be capable of considering the alternatives implied by materialized views. In particular, a materialized view may need to be considered even if the view is not directly applicable (i.e., there is no subexpression in the query that syntactically matches the view). Also, more than one materialized views may be relevant for the given query. In such cases, the optimizer must avoid incorrect alternatives where mutually exclusive views are used together while considering use of mutually compatible views. The following examples illustrate the issues in optimizing queries with materialized views. The first example emphasizes the importance of syntax independence and also shows that sometimes use of materialized views may result in worse plans. The second example illustrates the subtleties in syntax independent enumeration discussed above. The examples use a database containing an employee relation Emp(name, duo, s a l , age) and a department relat i o n D e p t ( d n o , s i z e , l o c ) . E x a m p l e 1 .1 : Let there be a materialized view Executive(name, duo, s a l ) that contains all employees whose salary is greater than 200k. Consider the query that asks for employees (and their department number) whose salary is greater than 200k and who are in the department with duo = 0419. If the relation Emp has no index on duo, then it is better to access the materialized view E x e c u t i v e even though the user presents a query which does not refer to the materialized view Execu t ive . This example illustrates that the use of a materialized view can be beneficial even if a query does not refer to the materialized view explicitly. On the other hand, it may be possible to obtain a cheaper plan by not using a materialized view even if the query does reference the view explicitly. Consider 1063-6382/95 $4.00 © 1995 IEEE 190 the query that asks for all executives in dno = 419. This query explicitly refers to the materialized view Execu t ive . However, if there is an index on the dno attribute of the relation Emp, then it may be better to expand the view definition in order to use the index on dno attribute of the relation Fanp. Thus, the choice between a materialized view and a view expansion must be cost-based and syntax-independent. | E x a m p l e 1.2: The purpose of this example is to illustrate the nature of enumeration of alternatives that arise when materialized views are present. Consider the query which asks for employees who earn more than 220k. Although the materialized view for E x e c u t i v e does not syntactically match any subexpression of the query, it could still be used to answer the above query by retaining the selection condition on salary. Next, we illustrate a case of mutually compatible use of materialized views. Consider the query that asks for employees who earn more than 200k and who have been working in departments of s i z e > 30 employees. If there is a materialized view Large..Dept (d.no, 1oc) containing all departments (with their location) where number of employees exceed 30, then the latter may be used along with E x e c u t i v e to answer the query. Finally, there are cases where uses of two materialized views are incompatible. Assume that a materialized view Loc..Erap(name, s i z e , l o c ) is maintained that records for each employee the location of her work. If the query asks for all employees who work in large departments located in San Francisco, then each of Loc_Emp and Large_Dept materialized views help generate alternative executions. But, uses of these two materialized views are mutually exclusive, i.e., they cannot be used together to answer a query. | The presence of materialized views and the requirement of syntax-independent optimization has the effect of increasing the space of alternative executions available to the optimizer since the latter must consider use and non-use of the materialized views. Since the query optimization algorithms take time exponential in the size of the queries, we must also ensure that the above enumeration of alternatives is done efficiently so as to minimize the increase in optimization time. Furthermore, we must also recognize the reality that for our proposal to be practical and immediately useful, it is imperative that our proposal be a generalization of the widely accepted optimization algorithm [SAC+79]. In this paper, we show how syntax-independent enumeration of alternative executions can be done efficiently. Our proposal constitutes a simple extension to the cost-based dynamic programming algorithm of [SAC+79] and ensures the optimality of the chosen plan over the extended execution space. The simplicity of our extension makes our solution practically acceptable. Yet, our approach proves to be significantly better than any simple-minded solution to the problem that may be adopted (See Section 4). For the rest of the paper, we assume that the query as well as the materialized views are conjunctive queries, i.e., the Select-Project-Join expressions such that the Where clause consists of a conjunction of simple predicates (e.g., =, <, >) only. Thus, the query has no aggregates or group-by clause. We will use the domain-calculus notation lUll89] to express conjunctive queries. However, our results extend to any Select-Project-Join queries and extensions beyond the above class are also considered in the full version of the paper [CKPS94]. The rest of the paper is organized as follows. We begin with an overview of our approach. In Section 3, we show how the equivalent queries may be formulated from the given query and the materialized views. In Section 4, we present the algorithm for join enumeration. We also contrast the efficiency of our algorithm with the existing approaches and present an experimental study. In Section 5, we discuss further generalizations of our approach. Section 6 mentions related work. 2 O v e r v i e w o f O u r A p p r o a c h In traditional query processing systems, references to views in a query are expanded by their definitions, resulting in a query that has only base tables. Relational systems that support views can do such unfolding. However, the presence of materialized views provide the opportunity to fold one or more of the subexpressions in the query into references to materialized views, thus generating additional alternatives to the unfolded query. Therefore, we must convey to the optimizer the information that enables it to fold the subexpressions corresponding to the materialized views. For every materialized view V, we will define a onelevel rule as follows. The left-hand side of the onelevel rule is a conjunctive query (body of the view definition) L and the right-hand side of the rule is a single literal (name of the view). We represent the rule as:
منابع مشابه
Optimizing Queries with Materialized Views
While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and eecient solution. Our solution has the desirable property that it is a simple generalization of the traditional query optimization...
متن کاملImprovement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation
A real-time data warehouse is a collection of recent and hierarchical data that is used for managers’ decision-making by creating online analytical queries. The volume of data collected from data sources and entered into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data to the real time data warehouse increases, the interference between online loading ...
متن کاملDesigning Views to Efficiently Answer Real SQL Queries
The problem of optimizing queries in the presence of materialized views and the related view-design problem have recently attracted a lot of attention. Significant research results have been reported, and materialized views are increasingly used in query evaluation in commercial data-management systems. At the same time, most results in the literature assume set-theoretic semantics, whereas SQL...
متن کاملOptimizing XML queries: Bitmapped materialized views vs. indexes
Optimizing queries using materialized views has not been addressed adequately in the context of XML due to the many limitations associated with the definition and usability of materialized views in traditional XML query evaluation models. In this paper, we address the XML query optimization problem using materialized views in the framework of the inverted lists evaluation model which has been e...
متن کاملDesigning Views to Answer Queries under Set, Bag,and BagSet Semantics
A lot of work has been done recently on optimizing queries in the presence of materialized views. However the majority of the research assumes set-theoretic semantics while SQL queries have bag-theoretic semantics (duplicates are not eliminated unless explicitly requested). This paper presents results on designing views to answer queries in relational databases under set, bag and bag-set semant...
متن کاملOptimizing Aggregate SPARQL Queries Using Materialized RDF Views
During recent years, more and more data has been published as native RDF datasets. In this setup, both the size of the datasets and the need to process aggregate queries represent challenges for standard SPARQL query processing techniques. To overcome these limitations, materialized views can be created and used as a source of precomputed partial results during query processing. However, materi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000